
    Models of binaural hearing for sound lateralisation and localisation

    No full text
    The current study suggests two models of binaural hearing, which aim to make predictions for inside- and outside-head localisation of a single sound source in the horizontal plane. Both models consider free-field ITDs and ILDs as the memory of sound localisation against which the target interaural disparity is compared. The first model, the characteristic-curve (CC) model, acquires the best estimate of a source location by finding the nearest neighbour of the target ITD and ILD on the characteristic curve of free-field interaural disparities. The second model, the pattern-matching (PM) model, assumes that the excitation-inhibition cell activity pattern suggested by Breebaart et al. [J. Acoust. Soc. Am., 110(2):1074-1088, 2001] provides the internal representation of the sound localisation cues. Given the uniqueness of EI patterns, the pattern-matching process operates in each auditory frequency band to give an estimate of the sound source position, which is then frequency-weighted to establish the probability function of the target location. In the two listening tests presented in the current study, both models were found capable of predicting many important features of human sound localisation. For example, the inside-head localisation (laterality) of dichotic pure tones was reasonably well predicted at low source frequencies, 600 Hz and 1200 Hz, by the CC model individualised for each participant. In addition, the prediction of the PM model compared successfully with listening test results in which the outside-head localisation of the participants was investigated for real and virtual acoustic sources. Given the simplicity and originality of the models in representing the central processes of auditory spatial hearing, particularly in handling the ILD information of binaural signals, the predictive scope of the models is regarded as worthy of further investigation. Furthermore, considering the reasonable predictions made for both lateralisation and localisation of acoustic stimuli, the models developed appear also to be well suited to the computational evaluation of spatial audio systems.
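    As an illustration of the CC model's nearest-neighbour step, the sketch below looks up a target (ITD, ILD) pair on a tabulated characteristic curve. The curve shapes, the cue weights, and all numerical values are illustrative assumptions, not the individualised free-field data used in the study.

```python
import numpy as np

# Hypothetical characteristic curve: free-field ITD (ms) and ILD (dB)
# tabulated against azimuth. Real curves come from individual HRTF
# measurements; these sinusoidal shapes are illustrative placeholders.
azimuths = np.linspace(-90.0, 90.0, 181)
itd_curve = 0.7 * np.sin(np.radians(azimuths))    # ms, roughly Woodworth-like
ild_curve = 10.0 * np.sin(np.radians(azimuths))   # dB, crude approximation

def cc_estimate(target_itd, target_ild, itd_weight=1.0, ild_weight=0.1):
    """Nearest-neighbour lookup of a target (ITD, ILD) pair on the
    characteristic curve; the weights trading off the two cues are assumed."""
    d2 = (itd_weight * (itd_curve - target_itd)) ** 2 \
       + (ild_weight * (ild_curve - target_ild)) ** 2
    return azimuths[np.argmin(d2)]

print(cc_estimate(0.35, 5.0))  # → 30.0
```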

    Extraction of voice from the center of the stereo image

    No full text
    Detection and extraction of the center vocal source is important for many audio format conversion and manipulation applications. First, we study some generic properties of stereo signals containing sources panned exactly to the center of the stereo image and propose an algorithm for the separation of a stereo audio signal into center and side channels. Recently, Park et al. [Proc. 129th AES Convention, London, 2010, Preprint 8071] presented the results of listening tests in which the perceived widths of stereo images were evaluated for synthetic signals. Given the center separation algorithm proposed in this paper, a similar experiment was carried out with realistic stereo audio content. The results show that there are clear differences between the stimuli used in the two experiments, which are discussed in this paper based on the analysis of the test signals and their binaural characteristics in the listening test configuration.
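    The abstract does not specify the separation algorithm itself, so the sketch below conveys only the general idea of center extraction: per STFT bin, the mid signal is attenuated wherever the two channels differ, so center-panned content survives and hard-panned content is suppressed. The similarity mask and all parameter values are assumptions, not the paper's method.

```python
import numpy as np

def extract_center(left, right, frame=1024, hop=512):
    """Generic similarity-mask center extraction (not the paper's algorithm):
    keep each STFT bin of the mid signal in proportion to how similar the
    two channels are in that bin."""
    win = np.hanning(frame)
    n_frames = 1 + (len(left) - frame) // hop
    center = np.zeros(len(left))
    for i in range(n_frames):
        s = i * hop
        L = np.fft.rfft(win * left[s:s + frame])
        R = np.fft.rfft(win * right[s:s + frame])
        # similarity mask: 1 for identical bins, approaching 0 as they diverge
        mask = 2 * np.abs(L * np.conj(R)) / (np.abs(L) ** 2 + np.abs(R) ** 2 + 1e-12)
        C = 0.5 * (L + R) * mask
        center[s:s + frame] += win * np.fft.irfft(C, n=frame)
    return center
```

    A center-panned source (identical in both channels) passes through with mask near 1, while a source present in only one channel is driven towards zero.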

    Pattern-matching analysis of fine echo delays by the spectrogram correlation and transformation receiver

    No full text
    Among a few previous attempts to model the outstanding echolocation capability of bats, the work by Saillant et al. [J. Acoust. Soc. Am. 94, 2691–2712 (1993)] is, arguably, one of the most frequently referenced studies, in which the predictions of the spectrogram correlation and transformation (SCAT) model were compared to the results of relevant behavioral experiments. The SCAT model consists of cochlear, spectrogram correlation and spectrogram transformation blocks, where the latter two processes estimate the overall and the fine time delays between the animal's call and the echoes, given the neural representation of the acoustic signals generated by the cochlear block. This paper first provides a rigorous account of the spectrogram transformation (ST) block. By approximating the neural signals in analytic forms, many aspects of the ST block are explained and discussed in relation to the predictive scope of the model. Furthermore, based on these analytical arguments, the ST block is investigated from a different point of view, interpreted as a pattern-matching process which may operate at a high level of the animal's auditory pathway.
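    The correlation stage can be conveyed in a much-reduced form: estimating the overall call-echo delay from the peak of a cross-correlation. The SCAT model actually correlates spectrogram-like neural representations per frequency channel; the broadband time-domain sketch below only shows the principle.

```python
import numpy as np

def echo_delay(call, echo, fs):
    """Reduced illustration of the correlation stage: the overall call-echo
    delay is read off the peak of the cross-correlation of the emitted call
    with the received echo. Sampling rate fs is in Hz."""
    xc = np.correlate(echo, call, mode="full")
    lag = np.argmax(xc) - (len(call) - 1)   # 'full' mode peak-to-lag offset
    return lag / fs
```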

    Data-driven modeling of the spatial sound experience

    No full text
    Since the evaluation of audio systems or processing schemes is time-consuming and resource-intensive, alternative objective evaluation methods have attracted considerable research interest. However, current perceptual models are not yet capable of replacing a human listener, especially when the test stimulus is complex, for example, a sound scene consisting of multiple time-varying acoustic images. This paper describes a data-driven approach to developing a model that predicts the subjective evaluation of complex acoustic scenes, where the extensive set of listening test results collected in the latest MPEG-H 3D audio initiative was used as training data. The results showed that a few selected outputs of various auditory models may form a useful set of features, with which linear regression and multilayer perceptron models reasonably predicted the overall distribution of listening test scores, estimating both its mean and variance.
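    A minimal sketch of the regression step described above, on synthetic data: a linear model with an intercept maps hypothetical auditory-model features to the mean and variance of listening-test scores via closed-form least squares. The features, weights, and noise level are made up for illustration; the paper's features come from auditory models and its targets from the MPEG-H listening tests.

```python
import numpy as np

# Synthetic stand-in for the training data (all values are invented).
rng = np.random.default_rng(1)
X = rng.standard_normal((120, 4))       # 120 stimuli x 4 auditory-model outputs
true_w = np.array([[3.0, 0.2],          # columns: [score mean, score variance]
                   [-2.0, 0.1],
                   [0.5, 0.4],
                   [1.0, 0.0]])
Y = X @ true_w + 0.05 * rng.standard_normal((120, 2))

# closed-form least squares with an intercept column
Xa = np.hstack([X, np.ones((120, 1))])
W, _, _, _ = np.linalg.lstsq(Xa, Y, rcond=None)
pred = Xa @ W
print(np.abs(pred - Y).mean())  # small residual on this synthetic data
```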

    An auditory process model for sound localization

    No full text
    An auditory process model for sound localization in the horizontal plane is presented in this paper. Based on equalization-cancellation (EC) theory, the binaural processor produces excitation-inhibition (EI) cell activity patterns in each frequency band, which are then combined by the central processor using a simple template-matching method. Gain and delay errors have been introduced at the end of the peripheral process in order to accommodate the imperfection possibly present in the human EC process. These parameters have been adjusted to fit the model performance to that of human listeners described in a few published subjective experiments. A certain value of the gain error has been found to give an acceptable model prediction in terms of both the mean error and the standard deviation.

    Evaluation of stereophonic images with listening tests and model simulations

    No full text
    A binaural hearing model has recently been suggested for the evaluation of the performance of virtual acoustic imaging systems. The model considers excitation-inhibition (EI) cell activity patterns as the internal representation of sound localisation cues, and a pattern-matching procedure with a frequency-weighting scheme produces the estimate of source location in the horizontal plane. Given the reasonable prediction of some important features of human sound localisation and lateralisation, this paper presents a further verification and application of the model in actual listening tests. In this work, participants' responses to stereophonic images have been compared with the predictions of the model, individually established from each subject's own HRTFs. Model predictions have been found to be both qualitatively and quantitatively consistent with the test results, and in particular, the agreement between 2 and 3 kHz gave a good indication that, unlike some similar models, the current model can effectively incorporate both ITD and ILD information according to their relative importance.

    A model of sound localisation applied to the evaluation of systems for stereophony

    No full text
    In this paper, a model of human sound localisation is described, and its predictions are compared to the results of listening tests. The model takes binaural signals as input, processing them in a series of signal processing modules that simulate the peripheral, binaural, and central stages of spatial hearing. In particular, the central processor of the model considers excitation-inhibition (EI) cell activity patterns as the internal representation of the available cues, and source location estimates are obtained using a simple pattern-matching procedure. In the listening tests, stereophonic images were presented to the listener's front, where the stimulus was either broadband noise or 1/3-octave-band noise at 7 centre frequencies from 0.5 kHz to 6 kHz. The subjective responses compared well to the model predictions across frequency, except for some cases where the image location was overestimated. Also, the prediction for the localisation of broadband phantom images agreed well with the test results, where the model prediction was integrated across frequency according to a tentatively suggested weighting function. Although the neuroscientific basis of the model is weak, the good agreement with the subjective responses suggests that the model is worth investigating further.
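    The cross-frequency integration step can be sketched as a weighted combination of per-band probability functions over candidate azimuths. The weights and probabilities below are arbitrary illustrations, not the tentatively suggested weighting function of the paper.

```python
import numpy as np

def combine_bands(band_probs, weights):
    """Sketch of cross-frequency integration: per-band probability functions
    over candidate azimuths are combined with a frequency-weighting vector
    and renormalised into a single probability function."""
    band_probs = np.asarray(band_probs, float)   # shape (n_bands, n_azimuths)
    weights = np.asarray(weights, float)
    combined = weights @ band_probs              # weighted sum across bands
    return combined / combined.sum()

# two hypothetical bands voting over five candidate azimuths
p = combine_bands([[0.1, 0.2, 0.4, 0.2, 0.1],
                   [0.0, 0.1, 0.6, 0.2, 0.1]],
                  [0.3, 0.7])
```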